gradient-type algorithms, as well as stochastic approximation algorithms. Due to space considerations,
we only discuss here the nature of the results. Exact
statements and the proofs may be found in (Tsitsiklis,
1983) and in forthcoming publications. Preliminary
versions of these results appear in (Tsitsiklis,
Bertsekas and Athans, 1983).

To discuss the nature of the convergence conditions,
we distinguish two cases:

A. Constant Step-Size Algorithms (e.g. gradient
algorithms)

For such algorithms it has been shown that convergence
to the centralized optimum is obtained, provided
that the time between consecutive communications
between pairs of decision makers, plus the communication
delay, is bounded by an appropriate constant.
Moreover, the larger the step-size (i.e. the constant
$\gamma$ in equation (1.2)), the smaller the above mentioned
constant. The latter statement admits the appealing
interpretation that the larger the updates by each
decision maker, the more frequent the communications
that are required.
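To see the trade-off between step size and delay on the smallest possible example, the sketch below assumes a one-dimensional cost $J(x) = x^2/2$ and a decision maker whose gradient information is always `delay` stages old, so that $x(n+1) = x(n) - \gamma\, x(n - \text{delay})$. The function `delayed_gradient` and all numerical values are hypothetical and serve only as an illustration; they are not the algorithms analyzed in the cited work.

```python
def delayed_gradient(gamma, delay, steps=2000, x0=1.0):
    """Iterate x(n+1) = x(n) - gamma * x(n - delay): gradient descent on
    J(x) = x**2 / 2 using gradient information `delay` stages old.

    Returns the largest |x| over the last 50 iterates, or inf if |x| exceeds 1e6.
    """
    history = [x0] * (delay + 1)  # constant initial history x(-delay), ..., x(0)
    for _ in range(steps):
        stale = history[-1 - delay]          # x(n - delay)
        x_next = history[-1] - gamma * stale
        if abs(x_next) > 1e6:
            return float("inf")              # the iteration is clearly diverging
        history.append(x_next)
    return max(abs(v) for v in history[-50:])


for gamma in (1.5, 0.5):
    for delay in range(4):
        print(f"gamma={gamma}, delay={delay}: {delayed_gradient(gamma, delay):.2e}")
```

For this toy recursion, $\gamma = 1.5$ converges only without delay, while $\gamma = 0.5$ still converges with a delay of two stages but diverges with three: the larger step tolerates less stale information.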

B. Decreasing Step-Size Algorithms (e.g. stochastic
approximation algorithms)

In this case, the algorithm becomes slower and
slower as the time index increases. This allows the
process of communications to become progressively
slower as well. In particular, it has been shown that
convergence to the centralized optimum is obtained
even if the time between consecutive communications
between pairs of decision makers, plus communication
delays, increases without bound as the algorithm
proceeds, provided that the rate of increase is not
too fast.
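The decreasing step-size case can be illustrated on the same kind of toy cost, $J(x) = x^2/2$. The sketch assumes the hypothetical choices $\gamma_n = 1/(n+1)$ and a delay $D(n) = \lfloor \sqrt{n} \rfloor$ that grows without bound but more slowly than $n$; neither choice comes from the results cited above.

```python
import math


def decreasing_step(steps=20000, x0=1.0):
    """Delayed gradient descent on J(x) = x**2 / 2 with a decreasing step
    gamma_n = 1/(n+1) and an unbounded delay D(n) = floor(sqrt(n))."""
    xs = [x0]                            # xs[t] holds x(t)
    for n in range(steps):
        delay = math.isqrt(n)            # grows without bound, but slower than n
        gamma = 1.0 / (n + 1)            # decreasing step size
        stale = xs[max(n - delay, 0)]    # information delay(n) stages old
        xs.append(xs[n] - gamma * stale)
    return xs[-1]


print(decreasing_step())
```

Even though the delay is unbounded, $\gamma_n D(n) \to 0$ in this example, and the iterate is driven towards the optimum $x = 0$.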

3.  A  Distributed  Gradient  Algorithm 

In this section we consider a rather simple distributed
algorithm for minimizing an additive cost function.
Due to the simplicity of the algorithm, we are
able to derive convergence conditions which are generally
tighter than the general conditions discussed
in the previous sections. It will be seen shortly
that these conditions admit appealing organizational
interpretations.

The conceptual motivation behind our approach is
based on the following statement:

If an optimization problem consists of subproblems,
each subproblem being assigned to
a different decision maker, then the frequency
of communications between a pair of decision
makers should reflect the degree by which their
subproblems are coupled.

The above statement is fairly hard to capture mathematically.
This is accomplished, however, to some
extent, by the model and the results of this section.

Let $J: \mathbb{R}^M \to \mathbb{R}$ be a cost function to be minimized,
with a special structure:

$$J(x) = J(x_1, \ldots, x_M) = \sum_{i=1}^{M} J^i(x_1, \ldots, x_M) \qquad (3.1)$$

where $J^i: \mathbb{R}^M \to \mathbb{R}$. So far, equation (3.1) does not impose
any restriction on $J$; we will be interested, however,
in the case where, for each $i$, $J^i$ depends on $x_i$
and only a few more components of $x$; consequently, the
Hessian matrix of each $J^i$ is sparse.

We view $J^i$ as a cost directly faced by the $i$-th
decision maker. This decision maker is free to fix
or update the component $x_i$, but his cost also depends
on a few interaction variables (other components of $x$)
which are under the authority of other decision makers.

We may visualize the structure of the interactions
by means of a directed graph $G = (V, E)$:

(i) The set $V$ of nodes of $G$ is $V = \{1, \ldots, M\}$.

(ii) The set of edges $E$ of the graph is

$$E = \{(i, j) : J^j \text{ depends on } x_i\} \qquad (3.2)$$
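The graph $G$ can be built directly from the list of variables that each cost term uses. The sketch below reads $(i, j) \in E$ as "$J^j$ depends on $x_i$", the reading under which decision maker $j$ can evaluate $\partial J^j / \partial x_i$ and send it to decision maker $i$ in the algorithm that follows. The three-division dependency pattern is a made-up example, and indices start at 0.

```python
def interaction_graph(depends_on):
    """Edge set E = {(i, j) : J^j depends on x_i}.

    depends_on[j] is the set of (0-based) components of x used by J^j."""
    return {(i, j) for j, comps in enumerate(depends_on) for i in comps}


# Hypothetical chain of three divisions:
# J^0 uses x_0, x_1; J^1 uses x_0, x_1, x_2; J^2 uses x_1, x_2.
deps = [{0, 1}, {0, 1, 2}, {1, 2}]
E = interaction_graph(deps)
for i in range(len(deps)):
    # decision makers whose costs are affected by x_i (excluding i itself)
    print(i, "->", sorted(j for (k, j) in E if k == i and j != i))
```

Each decision maker $i$ then needs to exchange messages only with the decision makers listed for it. This is where the sparsity of the Hessians pays off.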


Since we are interested in the fine structure of the
optimization problem, we quantify the interactions
between subproblems by assuming that the following
bounds are available:
where (without loss of generality)
A synchronous distributed gradient-type algorithm for
this problem could be:

1. For each $(i, j) \in E$, decision maker $j$ evaluates

$$\lambda^j_i(n) = \frac{\partial J^j}{\partial x_i}\big(x(n)\big) \qquad (3.5)$$

2. For each $(i, j) \in E$, decision maker $j$ transmits $\lambda^j_i(n)$
to decision maker $i$.

3. Each decision maker $i$ updates $x_i$ according to

$$x_i(n+1) = x_i(n) - \gamma \sum_{j : (i, j) \in E} \lambda^j_i(n) \qquad (3.6)$$

4. For each $(i, j) \in E$, decision maker $i$ transmits
$x_i(n+1)$ to decision maker $j$.
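The four steps above can be simulated in a single process. The sketch assumes a hypothetical chain of $M = 3$ divisions with quadratic costs $J^j(x) = (x_j - c_j)^2/2 + b\,(x_j - x_{j+1})^2/2$ for $j < M - 1$ and $J^{M-1}(x) = (x_{M-1} - c_{M-1})^2/2$ (indices from 0), so that $J^j$ depends on $x_j$ and $x_{j+1}$. The names `make_chain_problem` and `synchronous_gradient` and the data $c$, $b$, $\gamma$ are illustrative.

```python
def make_chain_problem(c, b):
    """Hypothetical additive cost with M = len(c) divisions:
    J^j(x) = (x_j - c_j)^2 / 2 + b * (x_j - x_{j+1})^2 / 2 for j < M - 1,
    J^{M-1}(x) = (x_{M-1} - c_{M-1})^2 / 2.
    Returns {(i, j): callable giving dJ^j/dx_i}, one entry per edge (i, j) of E."""
    M = len(c)
    grads = {}
    for j in range(M):
        if j < M - 1:
            grads[(j, j)] = lambda x, j=j: (x[j] - c[j]) + b * (x[j] - x[j + 1])
            grads[(j + 1, j)] = lambda x, j=j: -b * (x[j] - x[j + 1])
        else:
            grads[(j, j)] = lambda x, j=j: x[j] - c[j]
    return grads


def synchronous_gradient(grads, x0, gamma, steps):
    x = list(x0)
    M = len(x)
    for _ in range(steps):
        # Steps 1-2: for each (i, j) in E, decision maker j evaluates
        # lambda^j_i(n) = dJ^j/dx_i at x(n) and sends it to decision maker i.
        inbox = [[] for _ in range(M)]
        for (i, j), dJj_dxi in grads.items():
            inbox[i].append(dJj_dxi(x))
        # Step 3: x_i(n+1) = x_i(n) - gamma * (sum of received lambda^j_i(n)).
        x = [x[i] - gamma * sum(inbox[i]) for i in range(M)]
        # Step 4 is implicit here: every decision maker reads the new list x
        # at the next stage, as if x_i(n+1) had been sent along each edge.
    return x


grads = make_chain_problem(c=[1.0, 2.0, 3.0], b=0.25)
print(synchronous_gradient(grads, [0.0, 0.0, 0.0], gamma=0.2, steps=200))
```

With $c = (1, 2, 3)$ and $b = 0.25$, the minimizer of $J^0 + J^1 + J^2$ is $(1.2, 2, 2.8)$, and the iterates approach it.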

We now consider the asynchronous version of the
above algorithm. Let $x^i(n) = (x^i_1(n), \ldots, x^i_M(n))$ denote a
decision vector (element of $\mathbb{R}^M$) stored in the memory
of decision maker $i$ at time $n$. We also assume that
each decision maker $i$ stores in his memory another vector
$(\lambda^1_i(n), \ldots, \lambda^M_i(n))$ with his estimates of
$\left(\frac{\partial J^1}{\partial x_i}, \ldots, \frac{\partial J^M}{\partial x_i}\right)$.
Unlike the synchronous algorithm, we
do not require that a message be transmitted at each
time stage, and we allow communication delays. So let:

$p^{ik}(n)$ = the time that a message with a value
of $x_k$ was sent from processor $k$ to
processor $i$, this being the last
such message received no later than
time $n$.

$q^{ik}(n)$ = the time that a message with a value of
$\partial J^k / \partial x_i$ was sent from processor $k$ to processor
$i$, this being the last such message
received no later than time $n$.

For consistency of notation we let

$$p^{ii}(n) = q^{ii}(n) = n, \qquad \forall i,\ \forall n.$$

With the above definitions, we have:

Equations (3.8), (3.9), together with

specify completely the asynchronous distributed algorithm of interest.

Let us now assume that the time between consecutive
communications and the communication delays are bounded.
We allow, however, these bounds to be different for
each pair of processors and each type of message:


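Assumption: For some constants $P^{ik}$, $Q^{ik}$,

$$n - P^{ik} \le p^{ik}(n) \le n, \qquad \forall (k, i) \in E,\ \forall n, \qquad (3.11)$$

$$n - Q^{ik} \le q^{ik}(n) \le n, \qquad \forall (i, k) \in E,\ \forall n. \qquad (3.12)$$

Note that we may let $P^{ii} = Q^{ii} = 0$.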
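The asynchronous version can be sketched along the same lines. The sketch models the algorithm under our own reading of the definitions above: decision maker $i$ uses $x^i_k(n) = x^k_k(p^{ik}(n))$ and $\lambda^k_i(n) = \frac{\partial J^k}{\partial x_i}\big(x^k(q^{ik}(n))\big)$. Staleness is applied to every pair of decision makers, and the lags are drawn at random subject to bounds in the spirit of (3.11)-(3.12). The chain costs, the random lags and all names are hypothetical. The problem definition is repeated so that the block runs on its own.

```python
import random


def make_chain_problem(c, b):
    """Hypothetical chain costs J^j = (x_j - c_j)^2/2 + b (x_j - x_{j+1})^2/2
    (no coupling term for the last division). Returns {(i, j): dJ^j/dx_i}."""
    M = len(c)
    grads = {}
    for j in range(M):
        if j < M - 1:
            grads[(j, j)] = lambda x, j=j: (x[j] - c[j]) + b * (x[j] - x[j + 1])
            grads[(j + 1, j)] = lambda x, j=j: -b * (x[j] - x[j + 1])
        else:
            grads[(j, j)] = lambda x, j=j: x[j] - c[j]
    return grads


def asynchronous_gradient(grads, x0, gamma, steps, P, Q, seed=0):
    """own[t][k]   = x^k_k(t), component k as held by its owner at time t.
    local[t][k] = x^k(t), the whole vector stored by decision maker k at time t.
    p[i][k] and q[(i, k)] play the roles of p^{ik}(n) and q^{ik}(n): they are
    nondecreasing, equal n when i == k, and lag n by at most P (resp. Q) stages.
    The random lags are a modelling assumption used only for illustration."""
    rng = random.Random(seed)
    M = len(x0)
    # incoming[i]: pairs (k, dJ^k/dx_i) for every edge (i, k) in E
    incoming = {i: [(k, g) for (ii, k), g in grads.items() if ii == i] for i in range(M)}
    own = [list(x0)]
    local = []
    p = [[0] * M for _ in range(M)]
    q = {edge: 0 for edge in grads}
    for n in range(steps):
        # x^i_k(n) = x^k_k(p^{ik}(n)): the last received value of each component
        views = []
        for i in range(M):
            for k in range(M):
                lag = 0 if i == k else rng.randint(0, P)
                p[i][k] = max(p[i][k], n - lag)
            views.append([own[p[i][k]][k] for k in range(M)])
        local.append(views)
        # lambda^k_i(n): dJ^k/dx_i evaluated at decision maker k's vector at time q^{ik}(n)
        new_own = []
        for i in range(M):
            total = 0.0
            for k, dJk_dxi in incoming[i]:
                lag = 0 if i == k else rng.randint(0, Q)
                q[(i, k)] = max(q[(i, k)], n - lag)
                total += dJk_dxi(local[q[(i, k)]][k])
            new_own.append(own[n][i] - gamma * total)
        own.append(new_own)
    return own[-1]


grads = make_chain_problem(c=[1.0, 2.0, 3.0], b=0.25)
print(asynchronous_gradient(grads, [0.0, 0.0, 0.0], gamma=0.1, steps=3000, P=3, Q=3))
```

With $b = 0.25$ and $\gamma = 0.1$, each update is a contraction in the maximum norm whatever the lags, so the iterates reach $(1.2, 2, 2.8)$ here. With stronger coupling or a larger step, the size of the lags would start to matter.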


The following result states that the algorithm converges
if $P^{ik}$ and $Q^{ik}$ are not too large compared to the
degree of coupling between different subproblems
(Tsitsiklis, 1983).


Theorem 3.1: Suppose that for each $i$ the inequality (3.13) holds.
Let $z(n) = (x^1_1(n), x^2_2(n), \ldots, x^M_M(n))$. Then,

(3.14)

We close this section with a few remarks:


1. The bounds provided by (3.13) are sufficient for
convergence but not necessary. It is known (Bertsekas,
1982b) that a decentralized algorithm of a similar type
may converge in certain special cases, even if the
coupling bounds are held fixed, while the bounds $P^{ik}$, $Q^{ik}$ are allowed
to be arbitrarily large. Thus, the gap between the sufficient
conditions (3.13) and the necessary conditions
may be substantial. Further research should narrow this
gap.

2. The convergence rate of the distributed algorithm
should be expected to deteriorate as the bounds $P^{ik}$, $Q^{ik}$
increase. A characterization of the convergence
rate, however, seems to be a fairly hard problem.
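Both remarks can be illustrated with a two-variable toy problem. This is a hypothetical example, not the setting of the cited work: $J(x) = (x_1^2 + x_2^2)/2 + a\,x_1 x_2$ with $|a| < 1$, where each decision maker knows its own component exactly and sees the other one `delay` stages late. With $\gamma = 0.5$ and $a = 0.5$, each update satisfies $|1 - \gamma| + \gamma |a| < 1$, a contraction in the maximum norm, so the iteration converges for every fixed delay. The number of stages needed, however, grows with the delay.

```python
def two_division_delay(delay, gamma=0.5, a=0.5, tol=1e-8, max_steps=100_000):
    """Hypothetical example: J(x) = (x_1**2 + x_2**2) / 2 + a * x_1 * x_2, |a| < 1.

    Decision maker 1 updates x_1(n+1) = x_1(n) - gamma * (x_1(n) + a * x_2(n - delay)),
    and decision maker 2 does the same with the roles exchanged. Returns the number
    of stages until max(|x_1|, |x_2|) < tol (the minimizer is x = 0), or None.
    """
    h1, h2 = [1.0] * (delay + 1), [-1.0] * (delay + 1)   # constant initial histories
    for n in range(max_steps):
        x1, x2 = h1[-1], h2[-1]
        if max(abs(x1), abs(x2)) < tol:
            return n
        s1, s2 = h1[-1 - delay], h2[-1 - delay]           # stale copies
        h1.append(x1 - gamma * (x1 + a * s2))
        h2.append(x2 - gamma * (x2 + a * s1))
    return None


for delay in (0, 10, 100):
    print(delay, two_division_delay(delay))
```

Here the coupling is held fixed while the delay grows, as in Remark 1, and the growing stage counts reflect the deterioration described in Remark 2.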


4. Towards Organizational Design

Suppose that we have a divisionalized organization
and that the objective of the organization is to minimize
a cost $J$ which is the sum of the costs $J^i$ faced by
each division. To each division there corresponds a
decision maker who is knowledgeable enough about the
structure of the problem he is facing that, given a
tentative decision, he is able to change his
decision in a direction of improvement. Moreover, suppose
that the divisions are interacting in some way; that
is, the decision of one decision maker may affect the
costs of another division. Suppose, finally, that decision
makers regularly update their decisions, taking
into account the decisions of other decision makers and
the effects of their own decisions on other divisions.
Messages are exchanged from time to time, carrying
the required information. Clearly, the mathematical
model of Section 3 may be viewed as a model of the
above situation.

A natural question raised by the situation described
above concerns the design of the information flows
within the organization, so as to guarantee smooth
operation. This is precisely the issue addressed
by Theorem 3.1: the coupling bounds may be thought of as quantifying
the degree of coupling between divisions; the
bounds $P^{ik}$, $Q^{ik}$ describe the frequency of communication;
and $\gamma$ represents the speed of adjustment. Theorem
3.1 links all these quantities together and provides
some conditions for smooth operation, whereby communication
rates are prescribed in terms of the degree
of coupling.

We may conclude that the approach of Section 3 may
form the basis of a procedure for designing an organizational
structure or, more precisely, the information
flows within an organization. Of course, Theorem 3.1
does not exhaust the subject. In particular, Theorem
3.1 suggests a set of feasible organizational structures,
with generally different convergence rates.
There remains the problem of choosing a "best" such
structure.

It is also conceivable that the structure of the
underlying optimization problem changes slowly with
time, and so do the coupling bounds, but on a time scale
slower than the time scale of the adjustment process.
In such a case, the communication bounds $P^{ik}$, $Q^{ik}$ should also change.
This leads to a natural two-level organizational structure.
At the lower level, we have a set of decision
makers continuously adjusting their decisions and
exchanging messages. At a higher level, we have a
supervisor who monitors changes in the coupling bounds and accordingly
instructs the low-level decision makers to adjust their
communication rates. Note that the supervisor does not
need to know the details of the cost function; he only
needs to know the degree of coupling between divisions.
This seems to reflect the actual structure of existing
organizations: low-level decision makers are "experts"
on the problems facing them, while higher-level decision
makers only know certain structural properties of the
overall problem and make certain global decisions, e.g.
setting the communication rates.

Event-Driven  Communications 


We now discuss a slightly different "mode of operation"
for the asynchronous algorithm, which also has
clear organizational implications. It should be clear
that communications are required by the distributed
algorithm so that decision makers are informed of
changes occurring elsewhere in the system. Moreover,
the bounds $P^{ik}$, $Q^{ik}$ of Section 3 effectively guarantee
that a message is sent whenever a substantial
change occurs. The same effect, however, could be accomplished
without imposing bounds on the time between
consecutive message transmissions: each decision maker
could simply monitor his decisions and inform the others
whenever a substantial change occurs. It seems that
the latter approach could result in significant savings
in the number of messages exchanged, but further
research is needed on this topic.
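The event-driven mode can be compared with broadcasting at every stage on a hypothetical chain problem ($M = 3$ divisions with quadratic costs and coupling weight $b$). In this sketch only the broadcasts of $x_j$ are event-driven, and the threshold `eps` stands in for a "substantial change". Gradient messages are assumed to arrive at every stage. The function name and all data are illustrative.

```python
def event_driven(c, b, gamma, eps, steps):
    """Hypothetical chain costs J^j = (x_j - c_j)^2/2 + b (x_j - x_{j+1})^2/2
    (no coupling term for the last division). Every decision maker updates its
    own x_j at every stage but broadcasts it only when it has moved more than
    eps away from the value it last broadcast. Gradient messages are assumed
    to be delivered at every stage. Returns (final x, number of x-broadcasts)."""
    M = len(c)
    x = [0.0] * M
    sent = list(x)                 # last broadcast value of each component
    broadcasts = 0
    for _ in range(steps):
        grad = [0.0] * M
        for j in range(M):
            # decision maker j's view: its own x_j exactly, others as last broadcast
            v = [x[k] if k == j else sent[k] for k in range(M)]
            grad[j] += v[j] - c[j]
            if j < M - 1:
                grad[j] += b * (v[j] - v[j + 1])       # dJ^j/dx_j, used by j itself
                grad[j + 1] += -b * (v[j] - v[j + 1])  # dJ^j/dx_{j+1}, sent to j + 1
        x = [x[i] - gamma * grad[i] for i in range(M)]
        for k in range(M):
            if abs(x[k] - sent[k]) > eps:              # "substantial change": broadcast
                sent[k] = x[k]
                broadcasts += 1
    return x, broadcasts


x, msgs = event_driven(c=[1.0, 2.0, 3.0], b=0.25, gamma=0.2, eps=1e-4, steps=500)
print(x, msgs, "broadcasts versus", 3 * 500, "for broadcasting at every stage")
```

The values other decision makers use differ from the current ones by at most `eps`, so the final error is of the order of `eps`. Far fewer broadcasts are needed than the $3 \times 500$ of the periodic scheme. The threshold trades accuracy against message count.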

5. Conclusions


A large class of deterministic and stochastic iterative
optimization algorithms admit natural distributed
asynchronous implementations. Such implementations
(when compared to their synchronous counterparts) may
retain the desired convergence properties, while reducing
communication requirements and removing bottlenecks
caused by communication delays.

We have focused on a deterministic gradient-type
algorithm for an additive cost function, and we have
shown that the communication requirements depend in a
natural way on the degree of coupling between different
components of the cost function. This approach addresses
the basic problem of designing the information flows